Methods

ROC with corresponding AUC for inferred GBM networks compared to BioGrid interactions

Epigenetic analysis

DNA methylation is an important component of numerous cellular processes, such as embryonic development, genomic imprinting, X-chromosome inactivation, and preservation of chromosome stability (Phillips 2008).

In mammals, DNA methylation is found sparsely but globally, distributed in definite CpG sequences throughout the entire genome. The exception is CpG islands (CGIs), which are short interspersed DNA sequences that are enriched for GC. These islands are normally found at sites of transcription initiation, and their methylation can lead to gene silencing (Deaton and Bird 2011).

Thus, the investigation of DNA methylation is crucial to understanding regulatory gene networks in cancer, as DNA methylation represses transcription (Robertson 2005). Detecting differentially methylated regions (DMRs) can therefore help us investigate these networks.

This section describes the analysis of DNA methylation using the Bioconductor package TCGAbiolinks (Colaprico et al. 2016). Because of the time required to perform this analysis, we selected only 10 LGG samples and 10 GBM samples that have both DNA methylation data from the Infinium HumanMethylation450 platform and gene expression data from Illumina HiSeq 2000 RNA Sequencing Version 2 analysis (lines 1-56 of the listing below show how the data were acquired). We started by checking the mean DNA methylation of the different groups of samples, then performed a DMR analysis, in which we searched for regions of possible biological significance (e.g., regions that are methylated in one group and unmethylated in the other). Once found, these regions can be visualized using heatmaps.
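To make the idea of a differentially methylated probe concrete, here is a minimal sketch in base R. It is not the TCGAbiolinks implementation: the toy beta-value matrix, the choice of a per-probe Wilcoxon rank-sum test with Benjamini-Hochberg adjustment, and the thresholds (absolute mean difference above 0.25, adjusted p-value below 0.01) are all assumptions made for illustration.

```r
# Toy beta-value matrix: rows are probes, columns are samples (values are made up).
set.seed(42)
n.probes <- 5
group <- rep(c("LGG", "GBM"), each = 10)
beta <- matrix(runif(n.probes * 20, min = 0.4, max = 0.6),
               nrow = n.probes,
               dimnames = list(paste0("cg", seq_len(n.probes)),
                               paste0("s", 1:20)))

# Make the first probe clearly methylated in LGG and unmethylated in GBM.
beta["cg1", group == "LGG"] <- runif(10, 0.80, 0.95)
beta["cg1", group == "GBM"] <- runif(10, 0.05, 0.20)

# Per-probe difference of group means.
diff.mean <- rowMeans(beta[, group == "LGG"]) - rowMeans(beta[, group == "GBM"])

# Per-probe Wilcoxon rank-sum test, then Benjamini-Hochberg adjustment.
p.raw <- apply(beta, 1, function(x) {
  wilcox.test(x[group == "LGG"], x[group == "GBM"])$p.value
})
p.adj <- p.adjust(p.raw, method = "BH")

# Hypothetical thresholds: |difference| > 0.25 and adjusted p < 0.01.
result <- data.frame(diff.mean, p.raw, p.adj)
result$status <- ifelse(abs(diff.mean) > 0.25 & p.adj < 0.01,
                        ifelse(diff.mean > 0,
                               "Hypermethylated in LGG",
                               "Hypomethylated in LGG"),
                        "Not significant")
print(result)
```

Only the first probe, which was constructed to differ between the groups, passes both thresholds. On real data the same logic would be applied to the assay matrix of the SummarizedExperiment object, with thresholds chosen for the study at hand.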

Visualizing the mean DNA methylation of each patient

Note that some pre-processing of the DNA methylation data was done. The DNA methylation data from the 450k platform has three types of probes: cg (CpG loci), ch (non-CpG loci), and rs (SNP assay). The last type of probe can be used for sample identification and tracking and, according to the Illumina manual, should be excluded from differential methylation analysis. Therefore, the rs probes were removed (see line 68 of the listing below). Probes in chromosomes X and Y were also removed to eliminate potential artifacts originating from differing proportions of males and females (Marabita et al. 2013). The last pre-processing step was to remove probes with at least one NA value (see line 65 of the listing below).
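The three filters described above can be sketched on a toy example in base R. The probe identifiers, chromosome assignments, and beta values below are made up for illustration; a real analysis would take them from the row annotation and assay of the SummarizedExperiment object.

```r
# Toy probe annotation and beta values (all values are made up for illustration).
probes <- data.frame(
  id  = c("cg0001", "cg0002", "ch.1.001", "rs0001", "cg0003", "cg0004"),
  chr = c("chr1",   "chrX",   "chr2",     "chr3",   "chrY",   "chr9"),
  stringsAsFactors = FALSE
)
beta <- matrix(c(0.10, 0.20,
                 0.30, 0.40,
                 0.50, NA,
                 0.60, 0.70,
                 0.80, 0.90,
                 0.15, 0.25),
               ncol = 2, byrow = TRUE,
               dimnames = list(probes$id, c("sample1", "sample2")))

# Combine the three filters into one logical vector over probes.
keep <- !grepl("^rs", probes$id) &             # drop SNP-assay (rs) probes
        !(probes$chr %in% c("chrX", "chrY")) & # drop sex-chromosome probes
        rowSums(is.na(beta)) == 0              # drop probes with any NA value

beta.filtered <- beta[keep, , drop = FALSE]
print(rownames(beta.filtered))
```

In this toy case only `cg0001` and `cg0004` survive: one probe is dropped for being an rs probe, two for lying on chrX or chrY, and one for containing an NA.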

After this pre-processing, the TCGAvisualize_meanMethylation function lets us look at the mean DNA methylation of each patient in each group. It takes as arguments a SummarizedExperiment object with the DNA methylation data, plus groupCol and subgroupCol, which should be two columns from the sample information matrix of the SummarizedExperiment object (accessed with the colData function) (see lines 70-74 of the listing below).

#----------------------------
# Obtaining DNA methylation
#----------------------------
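library(TCGAbiolinks)
library(SummarizedExperiment)

# Samples: 10 LGG and 10 GBM cases with both DNA methylation and expression data
lgg.samples <- matched_met_exp("TCGA-LGG", n = 10)
gbm.samples <- matched_met_exp("TCGA-GBM", n = 10)
samples <- c(lgg.samples, gbm.samples)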

#-----------------------------------
# 1 - Methylation
# ----------------------------------
# For methylation it is quicker in this case to download the tar.gz file
# and extract the samples we want instead of downloading file by file
query <- GDCquery(project = c("TCGA-LGG","TCGA-GBM"),
                  data.category = "DNA methylation",
                  platform = "Illumina Human Methylation 450",
                  legacy = TRUE, 
                  barcode = samples)
GDCdownload(query)
met <- GDCprepare(query, save = FALSE)

# We will use only chr9 to make the example faster
met <- subset(met,subset = as.character(seqnames(met)) %in% c("chr9"))
# Save the subset; the resulting object is also available in the
# TCGAWorkflowData package and is loaded from there below
save(met, file = "met.20.samples.GBM.LGG.chr9.rda")
library(TCGAWorkflowData)
data("met.20.samples.GBM.LGG.chr9")
#----------------------------
# Mean methylation
#----------------------------
# Plot a boxplot of mean methylation for the groups in the disease_type
# column of the SummarizedExperiment object

# remove probes with NA (similar to na.omit)
met <- subset(met,subset = (rowSums(is.na(assay(met))) == 0))

TCGAvisualize_meanMethylation(met,
                              groupCol = "disease_type",
                              group.legend  = "Groups",
                              filename = "mean_lgg_gbm.png",
                              print.pvalue = TRUE)
##                     groups      Mean    Median       Max       Min
## 1 Brain Lower Grade Glioma 0.5159918 0.5320007 0.5610239 0.4509977
## 2  Glioblastoma Multiforme 0.4600086 0.4780980 0.5186596 0.3060219
##                          Brain Lower Grade Glioma Glioblastoma Multiforme
## Brain Lower Grade Glioma                       NA              0.02181625
## Glioblastoma Multiforme                0.02181625                      NA
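The summary printed above can be approximated by hand, which may help clarify what the plot shows. The sketch below uses a simulated beta-value matrix and a Wilcoxon rank-sum test; both are assumptions for illustration, and the test is not necessarily the one TCGAvisualize_meanMethylation uses for its p-value.

```r
# Simulated beta values: 100 probes by 20 samples (values are made up).
set.seed(1)
group <- rep(c("LGG", "GBM"), each = 10)
beta <- matrix(runif(100 * 20), nrow = 100,
               dimnames = list(paste0("cg", 1:100), paste0("s", 1:20)))

# One mean per sample: these are the dots drawn on the boxplot.
sample.mean <- colMeans(beta)

# Per-group summary of the sample means (Mean, Median, Max, Min).
summary.table <- do.call(rbind, lapply(split(sample.mean, group), function(m) {
  data.frame(Mean = mean(m), Median = median(m), Max = max(m), Min = min(m))
}))
print(summary.table)

# Assumed test for the group comparison: Wilcoxon rank-sum on the sample means.
p.value <- wilcox.test(sample.mean[group == "LGG"],
                       sample.mean[group == "GBM"])$p.value
print(p.value)
```

With real data, `beta` would be replaced by `assay(met)` and `group` by the `disease_type` column of `colData(met)`.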

The figure below shows the mean DNA methylation of each sample in the GBM group (10 samples) and of each sample in the LGG group (10 samples). This view of the data highlights a difference between the two groups of tumors, consistent with the pairwise p-value of about 0.022 in the output above.

Boxplot of mean DNA methylation of each sample (black dots)

Deaton, Aimée M, and Adrian Bird. 2011. “CpG Islands and the Regulation of Transcription.” Genes & Development 25 (10). Cold Spring Harbor Lab: 1010–22.

Marabita, Francesco, Malin Almgren, Maléne E Lindholm, Sabrina Ruhrmann, Fredrik Fagerström-Billai, Maja Jagodic, Carl J Sundberg, et al. 2013. “An Evaluation of Analysis Pipelines for DNA Methylation Profiling Using the Illumina HumanMethylation450 BeadChip Platform.” Epigenetics 8 (3). Taylor & Francis: 333–46.

Phillips, Theresa. 2008. “The Role of Methylation in Gene Expression.” Nature Education 1 (1): 116.

Robertson, Keith D. 2005. “DNA Methylation and Human Disease.” Nature Reviews Genetics 6 (8). Nature Publishing Group: 597–610.